A Continuous-time Markov Decision Process Based Method on Pursuit-Evasion Problem
Abstract
This paper presents a method for the pursuit-evasion problem that incorporates the behaviors of the opponent. The method introduces a continuous-time Markov decision process (CTMDP) model, which differs from a Markov decision process (MDP) chiefly in that it takes into account the transition time between states. Introducing the concept of a situation yields probabilities that describe average behaviors, and these probabilities are used to construct the transition matrix of the CTMDP. A policy iteration method for solving the CTMDP is also given. To demonstrate the CTMDP method for pursuit-evasion, examples in a grid environment are computed. The CTMDP-based method offers a new approach to pursuit-evasion modeling and may be extended to similar sequential decision problems.
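As a rough illustration of what policy iteration on a CTMDP can look like, the sketch below uses the standard discounted-reward formulation, in which a fixed policy is evaluated by solving (alpha I - Q_pi) V = r_pi and then improved greedily. This is not the paper's algorithm: the discount rate `alpha`, the array layout, the function name `ctmdp_policy_iteration`, and the two-state toy model are all assumptions made for illustration, and the situation-based transition probabilities described in the abstract are not reproduced here.

```python
import numpy as np

def ctmdp_policy_iteration(Q, r, alpha, max_iter=100):
    """Policy iteration for a finite, discounted CTMDP (illustrative sketch).

    Q     : array (A, S, S); Q[a] is a generator matrix (rows sum to zero)
    r     : array (S, A); reward rate earned while in state s under action a
    alpha : positive continuous-time discount rate (an assumed parameter)
    """
    n_actions, n_states, _ = Q.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (alpha * I - Q_pi) V = r_pi.
        Q_pi = Q[policy, states, :]   # row s is Q[policy[s], s, :]
        r_pi = r[states, policy]
        V = np.linalg.solve(alpha * np.eye(n_states) - Q_pi, r_pi)
        # Policy improvement: maximize r(s, a) + sum_t q(t | s, a) V(t).
        gain = r + np.einsum("ast,t->sa", Q, V)
        new_policy = np.argmax(gain, axis=1)
        # Keep the current action on ties so the loop terminates.
        tie = np.isclose(gain[states, new_policy], gain[states, policy])
        new_policy = np.where(tie, policy, new_policy)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V

# Hypothetical toy model: state 1 pays reward rate 1 and is absorbing;
# in state 0, action 1 jumps to state 1 at rate 1, action 0 stays put.
Q = np.array([
    [[0.0, 0.0], [0.0, 0.0]],    # action 0
    [[-1.0, 1.0], [0.0, 0.0]],   # action 1
])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
policy, V = ctmdp_policy_iteration(Q, r, alpha=0.5)
```

In this toy model the procedure selects action 1 in state 0. The jump rate enters the value directly, since V(0) = V(1) / (alpha + 1), which is the kind of transition-time effect that a discrete-time MDP does not capture.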
Similar resources
Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games
We study two-player pursuit-evasion games with concurrent moves, infinite horizon, and discounted rewards. The players have partial observability; however, the evader is given the advantage of knowing the current position of the units of the pursuer. We show that (1) value functions of this game depend only on the position of the pursuing units and the belief the pursuer has about the position o...
A Model-Based Approach to Optimizing Ms. Pac-Man Game Strategies in Real Time
This paper presents a model-based approach for computing real-time optimal decision strategies in the pursuit-evasion game of Ms. Pac-Man. The game of Ms. Pac-Man is an excellent benchmark pursuit-evasion problem with multiple, active adversaries that adapt their pursuit policies based on Ms. Pac-Man’s state and decisions. In addition to evading the adversaries, the agent must pursue mul...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games
In this work, we propose a new fuzzy reinforcement learning algorithm for differential games that have continuous state and action spaces. The proposed algorithm uses function approximation systems whose parameters are updated differently from the updating mechanisms used in the algorithms proposed in the literature. Unlike the algorithms presented in the literature which use the direct algorit...
The temporal derivative of expected utility: A neural mechanism for dynamic decision-making
Real world tasks involving moving targets, such as driving a vehicle, are performed based on continuous decisions thought to depend upon the temporal derivative of the expected utility (∂V/∂t), where the expected utility (V) is the effective value of a future reward. However, the neural mechanisms that underlie dynamic decision-making are not well understood. This study investigates human neura...